Neural processing devices for handling real-valued inputs

ABSTRACT

A device for use in a neural processing network. The network includes a memory having a plurality of storage locations, each of which stores a number representing a probability. Each storage location is selectively addressable to cause the contents of the location to be read to an input of a comparator. A noise generator inputs to the comparator a random number representing noise. At an output of the comparator, an output signal appears having a first or second value depending on the values of the numbers received from the addressed storage location and the noise generator. The probability that the output signal has a given one of the first and second values is determined by the number at the addressed location. The address inputs for the memory are derived from a real-to-spike-frequency translator which has real values as its input vector.

BACKGROUND OF THE INVENTION

This invention relates to artificial neuron-like devices (hereinafter referred to simply as "neurons") for use in neural processing.

One of the known ways of realising a neuron in practice is to use a random access memory (RAM). The use of RAMs for this purpose dates back a considerable number of years. Recently, a particular form of RAM has been described (see Proceedings of the First IEE International Conference on Artificial Neural Networks, IEE, 1989, No. 313, pp. 242-246) which appears to have the potential for constructing neural networks which mimic more closely than hitherto the behaviour of physiological networks. This form of RAM is referred to as a pRAM (probabilistic random access memory). For a detailed discussion of the pRAM, attention is directed to the paper identified above. However, a brief discussion of the pRAM is set out below, by way of introduction to the invention.

The pRAM is a hardware device with intrinsically neuron-like behaviour (FIG. 1). It maps binary inputs [5] (representing the presence or absence of a pulse on each of N input lines) to a binary output [4] (a 1 being equivalent to a firing event, a 0 to inactivity). This mapping from {0,1}^N to {0,1} is in general a stochastic function. If the 2^N address locations [3] in an N-input pRAM A are indexed by an N-bit binary address vector u, using an address decoder [6], the output a ∈ {0,1} of A is 1 with probability

    Prob(a = 1 | i) = Σ_{u ∈ {0,1}^N} α_u Π_{j=1}^{N} (u_j i_j + ū_j ī_j)    (1)

where i ∈ {0,1}^N is the vector representing input activity (and x̄ is defined to be 1−x for any x). The quantity α_u represents a probability. In the hardware realisation of the device, α_u is represented as an M-bit integer in the memory locations [3], having a value in the range 0 to 2^M − 1, and these values represent probabilities in the range 0 to (2^M − 1)/2^M, in steps of 1/2^M. The α_u may be assigned values which have a neuro-biological interpretation: it is this feature which allows networks of pRAMs, with suitably chosen memory contents, to closely mimic the behaviour of living neural systems. In a pRAM, all 2^N memory components are independent random variables. Thus, in addition to possessing a maximal degree of non-linearity in its response function--a deterministic pRAM (one in which each α_u ∈ {0,1}) can realise any of the 2^(2^N) possible binary functions of its inputs--pRAMs differ from units more conventionally used in neural network applications in that noise is introduced at the synaptic rather than the threshold level; it is well known that synaptic noise is the dominant source of stochastic behaviour in biological neurons.

This noise, ν, is introduced by the noise generator [1]. ν is an M-bit integer which varies over time and is generated by a random number generator. The comparator [2] compares the value stored at the memory location being addressed with ν. One way of doing this is to add the value stored at the addressed location to ν. If there is a carry bit in the sum, i.e. the sum has M+1 bits, a spike representing a 1 is generated on arrival of the clock pulse [7]. If there is no carry bit no such spike is generated, and this represents a 0. It can be seen that the probability of a 1 being generated is equal to the probability represented by the number stored at the addressed location, and it is for this reason that the latter is referred to as a probability. It should be noted that the same result could be achieved in other ways, for example by generating a 1 if the value of the probability was greater than ν.

It can also be noted that because pRAM networks operate in terms of `spike trains` (streams of binary digits produced by the addressing of successive memory locations), information about the timing of firing events is retained; this potentially allows phenomena such as the observed phase-locking of visual neurons to be reproduced by pRAM nets, with the possibility of using such nets as part of an effective `vision machine`.
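By way of illustration, the behaviour just described can be summarised in software. The following minimal sketch (Python; the class name and structure are illustrative, not part of any embodiment) models an N-input pRAM whose M-bit memory contents are compared with the noise by addition, a carry bit signifying a firing event:

    import random

    class PRAM:
        def __init__(self, n_inputs, m_bits, contents):
            # contents: 2**n_inputs integers, each in 0 .. 2**m_bits - 1;
            # location u fires with probability contents[u] / 2**m_bits
            assert len(contents) == 2 ** n_inputs
            self.m = m_bits
            self.memory = list(contents)

        def step(self, i):
            # i: tuple of N binary inputs forming the address u (decoder [6])
            u = int("".join(str(b) for b in i), 2)
            noise = random.randrange(2 ** self.m)     # noise generator [1]
            # comparator [2]: add the memory contents to the noise; a carry
            # bit (a sum of M+1 bits) produces an output spike
            return 1 if self.memory[u] + noise >= 2 ** self.m else 0

For example, a 2-input pRAM with M = 8 and contents (0, 85, 170, 255) fires with probability 0, 85/256, 170/256 and 255/256 for the four addresses respectively.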

For information concerning in particular the mathematics of the pRAM, attention is directed to the paper written by the present inventors in the Proceedings of the First IEE International Conference on Artificial Neural Networks, IEE, 1989, No. 313, pp. 242-246, the contents of which are incorporated herein by reference.

FIG. 9 shows a simple neural network comprising two pRAMs denoted as RAM 1 and RAM 2. It will be understood that for practical applications much more extensive networks are required, the nature of which depends on the application concerned. Nevertheless, the network shown in FIG. 9 illustrates the basic principles. It will be seen that each pRAM has an output OUT and a pair of inputs denoted IN1 and IN2. Each output corresponds to the output [4] shown in FIG. 1. The output from RAM 1 is applied as an input to the input IN1 of RAM 1, and the output from RAM 2 is applied as an input to the input IN2 of RAM 1. The output from RAM 1 is also applied as an input to the input IN2 of RAM 2, and the output of RAM 2 is applied as an input to the input IN1 of RAM 2. The network operates in response to clock signals received from the circuit labelled TIMING & CONTROL.

The circuitry of RAM 1 is shown in detail in FIGS. 10A-10D. RAM 2 is identical, except that for each reference in FIGS. 10A-10D to RAM 1 there should be substituted a reference to RAM 2 and vice versa.

RAM 1 comprises a random number generator. This is of conventional construction and will therefore not be described here in detail. The embodiment shown here employs shift registers, and 127 stages are used to give a sequence length of 2¹²⁷ − 1. It will be noted that the random number generator has an array of three EXOR gates having inputs 2, 3 and 4 which can be connected to selected ones of the taps T of the shift registers. The taps selected in RAM 1 will be different to those selected in RAM 2, and appropriate selection, according to criteria well known to those in the art, avoids undesired correlation between the random numbers generated by the two generators. The output of the random number generator is an 8-bit random number which is fed as two 4-bit segments to two adders which make up a comparator.
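A shift-register generator of this general kind can be modelled in software as a linear feedback shift register. The sketch below assumes, purely for illustration, feedback taps corresponding to the primitive trinomial x¹²⁷ + x⁶³ + 1 (which gives the maximal sequence length of 2¹²⁷ − 1); the taps of the actual embodiment are selected differently for RAM 1 and RAM 2, as described above:

    class LFSR127:
        def __init__(self, seed):
            assert seed != 0                      # the all-zero state is forbidden
            self.state = seed & ((1 << 127) - 1)  # 127 stages

        def next_bit(self):
            # EXOR of the tapped stages (here stages 127 and 63) forms the
            # feedback bit shifted into the register
            bit = ((self.state >> 126) ^ (self.state >> 62)) & 1
            self.state = ((self.state << 1) | bit) & ((1 << 127) - 1)
            return bit

        def next_byte(self):
            # one 8-bit random number per GENERATE cycle,
            # one bit for each of the 8 SCLK pulses
            v = 0
            for _ in range(8):
                v = (v << 1) | self.next_bit()
            return v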

The illustrated embodiment has a memory which holds four 8-bit numbers held at four addresses. The memory is thus addressed by 2-bit addresses. At each operation of the network the contents of the addressed storage location in the memory are fed to the comparator, where they are added to the random number generated at that time. The output of the comparator is a `1` if the addition results in a carry bit and is a `0` otherwise.

The output of the comparator is fed to the output of the RAM (which is labelled OUT in FIG. 9) and also to a latch. Here it is held ready to form one bit of the next address to be supplied to the address decoder via which the memory is addressed. As can be seen by taking FIGS. 9 and 10A-10D together, the other bit of the address (i.e. that supplied to input IN2 of RAM 1) is the output of RAM 2.

FIGS. 10A-10D also show inputs labelled R1_LOAD and MEMORY DATA (FIG. 10D) which enable the system to be initialised by loading data into the memory at the outset, and an input SCLK by means of which clock pulses are supplied to RAM 1 from a clock generator (see below). Finally, as regards FIGS. 10A-10D, there is an input denoted GENERATE which is connected to the latch via an inverter gate, which serves to initiate the production of a new output from the pRAM and allows a set of 8 SCLK pulses to occur. The clock generator shown in FIG. 11 is of conventional construction and will therefore not be described in detail, its construction and operation being self-evident to a man skilled in the art from the Figure. This provides a burst of 8 clock signals at its output SCLK which is supplied to the timing input of each of RAM 1 and RAM 2. Each time a GENERATE pulse occurs, each of RAM 1 and RAM 2 generates a new 8-bit random number (one bit for each SCLK pulse), addresses a given one of the four storage locations in its memory, compares the random number with the contents of the addressed location, and generates an output accordingly.

The pRAM thus far described has no learning or training rule associated with it. The provision of a particularly advantageous form of training is claimed in our copending application filed on even date herewith under the title "Neural processing devices with learning capability." This will now be discussed.

Reinforcement training is a strategy used in problems of adaptive control in which individual behavioural units (here to be identified with pRAMs) only receive information about the quality of the performance of the system as a whole, and have to discover for themselves how to change their behaviour so as to improve this. Because it relies only on a global success/failure signal, reinforcement training is likely to be the method of choice for `on-line` neural network applications.

A form of reinforcement training for pRAMs has been devised which is fast and efficient (and which is capable, in an embodiment thereof, of being realised entirely with pRAM technology). This training algorithm may be implemented using digital or analogue hardware, thus making possible the manufacture of self-contained `learning pRAMs`. Networks of such units are likely to find wide application, for example in the control of autonomous robots. Control need not be centralised; small nets of learning pRAMs could for example be located in the individual joints of a robot limb. Such a control arrangement would in many ways be akin to the semi-autonomous neural ganglia found in insects.

According to the invention of our copending application there is provided a device for use in a neural processing network, comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored; means for selectively addressing each of the storage locations to cause the contents of the location to be read to an input of a comparator; a noise generator for inputting to the comparator a random number representing noise; means for causing to appear at an output of the comparator an output signal having a first or second value depending on the values of the numbers received from the addressed storage location and the noise generator, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed location; means for receiving from the environment signals representing success or failure of the network; means for changing the value of the number stored at the addressed location if a success signal is received, in such a way as to increase the probability of the successful action; and means for changing the value of the number stored at the addressed location if a failure signal is received, in such a way as to decrease the probability of the unsuccessful action. The number stored at the addressed location may be changed by an appropriate increment or decrement operation, for example.

A preferred form of training rule is described by the equation

    Δα_u(t) = ρ((a − α_u)r + λ(ā − α_u)p)(t)·δ(u − i(t))    (2)

where r(t) and p(t) are global success and failure signals ∈ {0,1} received from the environment at time t (the environmental response might itself be produced by a pRAM, though it might be produced by many other things), a(t) is the unit's binary output, and ρ, λ are constants. The delta function is included to make it clear that only the location which is actually addressed at time t is available to be modified, the contents of the other locations being unconnected with the behaviour that led to reward or punishment at time t. When r=1 (success) the probability α_u changes so as to increase the chance of emitting the same value from that location in the future, whilst if p=1 (failure) the probability of emitting the other value when addressed increases. The constant λ represents the ratio of punishment to reward; a non-zero value for λ ensures that training converges to an appropriate set of memory contents and that the system does not get trapped in false minima. Note that reward and penalty take effect independently; this allows the possibility of `neutral` actions which are neither punished nor rewarded, but may correspond to a useful exploration of the environment.
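In software, rule (2) amounts to the following update of the single addressed location (a sketch assuming the illustrative PRAM class above, with ā written as 1 − a):

    def train_step(pram, i, a, r, p, rho, lam):
        # rule (2): only location u = i(t) is modified
        u = int("".join(str(b) for b in i), 2)
        scale = 2 ** pram.m
        alpha = pram.memory[u] / scale             # alpha_u as a probability
        delta = rho * ((a - alpha) * r + lam * ((1 - a) - alpha) * p)
        alpha = min(max(alpha + delta, 0.0), 1.0)  # keep alpha_u in [0, 1]
        pram.memory[u] = min(int(round(alpha * scale)), scale - 1)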

BRIEF DESCRIPTION OF THE DRAWINGS

In the accompanying drawings:

FIG. 1 shows diagrammatically a pRAM, as described above;

FIG. 2 shows diagrammatically a pRAM having learning characteristics;

FIG. 3 shows an alternative pRAM having learning characteristics;

FIG. 4 shows diagrammatically a pRAM adapted to handle real-valued input;

FIG. 5 shows diagrammatically a pRAM having the ability to implement a more generalised learning rule than that employed in FIG. 2;

FIG. 6 shows diagrammatically a pRAM in which eligibility traces (explained below) are added to each memory location;

FIG. 7 shows how a pRAM with eligibility traces can be used to implement Equation 9(a) (for which see below);

FIG. 8 shows the further modifications needed to implement Equation 10 (for which see below);

FIG. 9 shows a simple neural network using two pRAMs;

FIGS. 10A, 10B, 10C and 10D, taken together, show a circuit diagram showing one of the pRAMs of FIG. 9 in detail, FIG. 10B being a continuation of the right side of FIG. 10A, FIG. 10C being a lower continuation of FIG. 10A, and FIG. 10D being a right continuation of FIG. 10C and a lower continuation of FIG. 10B; and

FIG. 11 is a circuit diagram showing the timing and control circuitry used in FIG. 9.

DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 2 shows one way in which rule (2) can be implemented in hardware. The memory contents α_i(t+1) are updated each clock period according to rule (2). The pRAM [8] is identical to the unit shown in FIG. 1 and described in the text above. For a given address on the address inputs [5], an output spike is generated as described above. The terms a − α_u and ā − α_u are produced using the inverter [11] and the adder/subtractors [12], where α_u is read from the pRAM memory port [9]. These terms are multiplied by the reward and penalty factors ρr [14] and ρλp [15] respectively using the multipliers [13]. The resultant reward/penalty increment is added to the value stored at the location being addressed [9] using a further adder [12], and is then written back into the memory using the write port [10].

The learning rule (2) achieves a close approximation to the theoretically expected final values of the memory contents for a suitably small value of the learning rate constant ρ. However, this may lead to a lengthy time for training. To increase the training speed, ρ may be initially set to a large value and subsequently decremented at each successive time step by a factor which vanishes suitably fast as the number of steps increases.
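A decaying learning-rate schedule of this kind might, for example, take the form sketched below; the particular schedule is an assumption for illustration, and any schedule which vanishes suitably fast would serve:

    def rho_schedule(rho0, tau, t):
        # start at a large value rho0 and decay towards zero
        # as the number of training steps t increases
        return rho0 / (1.0 + t / tau)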

The rule (2) may also be realised in hardware using pRAM technology (FIG. 3). The advantage of this method is that multiplier circuits are not required. However, it requires 2^M cycles to generate α_i(t+1), where M is the number of bits used to represent α_u. It is implementable, in this example, by an auxiliary 4-input pRAM [16] (FIG. 3) with input lines carrying α_i(t), a(t), r(t) and p(t) (the order of significance of the bits carried by the lines going from α_i to p), and with memory contents given by

    β = (0, 0, 0, 0, ρλ, 0, 0, 1−ρλ, 0, 1−ρ, ρ, 1, ρλ, 1−ρ, ρ, 1−ρλ)    (3)

Because α_i(t) ∈ [0,1], and pRAMs are neuron-like objects which communicate via discrete pulses, it is necessary to use time-averaging (over a number of cycles, here denoted by R) to implement the update. The output [17] of the auxiliary pRAM [16] in each step consists of the contents of one of two locations in pRAM [16], since a, r and p remain the same and only α_i alternates between 0 and 1. The output of the pRAM [16] accumulated over R time steps using the integrator [19] is the updated memory content α_i(t+1) ≡ α_i(t) + Δα_i(t), where Δα_i(t) is given by (2). The memory location is updated with the integrator output using the write memory port [10]. It is simplest to set R = 2^M, where M is the number of bits used to represent the α_u's. The steps used in the update are

0. Set contents of M-bit register [19] to zero.

1. Record i(t) (the location addressed), a(t) (using the latch [18]), and r(t) and p(t) (the reward [24] and penalty [25] signals). [20] represents the `environment` which provides the reward and penalty signals.

2. For the next R time steps repeatedly address the same location i in pRAM [8] (to produce the spike train α_i). Let these pulses, together with the recorded a, r and p, generate spikes from locations in the auxiliary pRAM [16], and accumulate these values in the integrator [19].

3. [19] now contains an M-bit approximation to α_i(t+1). Copy this into location i of pRAM [8] using port [10]. (A software sketch of these steps is given below.)
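Under the same illustrative assumptions as the earlier sketches, and with aux taken to be a 4-input pRAM whose memory holds the contents β of equation (3) scaled to M-bit integers, steps 0 to 3 might be rendered as:

    def update_via_auxiliary_pram(pram, aux, i, a, r, p):
        u = int("".join(str(b) for b in i), 2)
        R = 2 ** pram.m
        acc = 0                                   # step 0: clear integrator [19]
        for _ in range(R):                        # step 2: R time steps
            alpha_spike = pram.step(i)            # spike train alpha_i from location u
            # the recorded a, r, p together with alpha_i address the auxiliary pRAM [16]
            acc += aux.step((alpha_spike, a, r, p))
        pram.memory[u] = min(acc, 2 ** pram.m - 1)   # step 3: write back via port [10]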

When the pRAM is implemented using analogue circuitry, [19] becomes an integrator which is first cleared and then integrates over R time steps. The output after this period is then written into the pRAM address i. This is functionally identical to the description of the digital device above.

The ability to let the learning rate, ρ, decrease with time, as described in association with FIG. 2, may also be included in the method of FIG. 3.

There are many interesting problems of adaptive control which require real-valued inputs. An object of the invention is to provide a modified pRAM which enables such inputs to be handled.

According to the present invention in its preferred form, there is provided a device for use in a neural processing network comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored, the memory having a plurality of address lines to define a succession of storage location addresses; a comparator connected to receive as an input the contents of each of the successively addressed locations; a noise generator for inputting to the comparator a succession of random numbers representing noise; and means for causing to appear at the output of the comparator a succession of output signals each having a first or second value depending on the values of the numbers received from the addressed storage locations and the noise generator, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed location; characterized in that it comprises a real-number to digital converter which receives a plurality of real-valued numbers each in the range of 0 to 1 and produces at its output a corresponding plurality of synchronised parallel pulse trains which are applied to the respective address lines of the memory to define the storage location addresses, the probability of a pulse representing a 1 being present in an address on a given address line being equal to the value of the real-valued number from which the pulse train applied to that address line was derived; and an integrator for integrating the output signals from the comparator.

The device provided by the invention performs mappings from [0,1]^N to {0,1} using ideas of time-averaging similar to those used above to implement the reinforcement training rule (2). It is referred to herein as an integrating pRAM or i-pRAM, and is shown in FIG. 4. Thus a real-valued input vector [26] x ∈ [0,1]^N is approximated by the time-average (over some period R) of successive binary input patterns i ∈ {0,1}^N (by the real-to-spike-frequency translator [28]):

    x ≈ (1/R) Σ_{r=1}^{R} i(r)    (4)

Thus, each of the lines [26] which makes up the vector carries a real value in the range 0 to 1. For each line [26] there is a corresponding address input [5], and this carries a train of pulses in which the probability of there being, at any given instant, a pulse representing a 1 is equal to the real value on the corresponding line [26]. To put the matter another way, the time average of the pulse train carried by a given line [5] is equal to the value on the corresponding line [26]. The pulse trains on the lines [5] are synchronised with one another. The translator [28] might take various forms, and one possibility is for the translator [28] to be a pRAM itself.
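A software sketch of the translator [28] follows (one simple stochastic realisation among several; as noted, a pRAM could itself serve):

    import random

    def real_to_spikes(x, R):
        # x: N real values in [0, 1]; returns R synchronised binary input
        # patterns i(1)..i(R) whose time-average approximates x (equation (4))
        return [tuple(1 if random.random() < xj else 0 for xj in x)
                for _ in range(R)]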

At each time step r = 1 . . . R, i(r) selects a particular location in the pRAM [8] using the address inputs [5], resulting in a binary output at [4], denoted herein as a(r). These outputs are accumulated in a spike integrator [19] (see FIG. 4) whose contents were reset at the start of this cycle. The integrator [19] comprises a counter which counts the number of 1's received over a fixed interval and, if there is no lookup table [27] (for which see below), a device for generating a binary output [21] in dependence on the number counted. This device may itself operate in the manner of a pRAM with a single storage location, i.e. a random number can be added to the contents of the counter and a 0 or 1 generated depending on whether there is an overflow bit. After R time steps the contents of [19] are used to generate the binary i-pRAM output [21], which is 1 with probability

    Prob(a = 1 | x) = Σ_{u ∈ {0,1}^N} α_u X_u    (5)

where X_u = Prob(u addressed) is the more general distribution function which replaces the delta function on the right hand side of (1).
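Combining the translator, the pRAM and the spike integrator [19] gives the following i-pRAM output cycle (a sketch continuing the earlier illustrative code, with the output generated in the single-location pRAM manner just described):

    def ipram_output(pram, x, R):
        count = 0                          # integrator [19], reset at cycle start
        for i in real_to_spikes(x, R):
            count += pram.step(i)          # accumulate the output spikes a(r)
        # generate the binary output [21]: add a random number to the count
        # and take the overflow bit, so that Prob(a = 1) = count / R
        return 1 if count + random.randrange(R) >= R else 0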

As an alternative to averaging over a succession of fixed intervals, each beginning where the last ended, a moving average could be used, with the output [21] being generated after the formation of each average.

For some applications it might be desirable to use a function of Σ = Σ_u α_u X_u to generate the binary output a:

    Prob(a = 1 | x) = f(Σ)    (6)

f might for example be a sigmoid (with threshold θ and `inverse temperature` β):

    f(Σ) = 1/(1 + e^(−β(Σ−θ)))    (7)

In this case it would be necessary to appropriately transform the contents of the integrator [19] before using the i-pRAM output. This might be achieved locally in hardware by a lookup table, denoted by [27]. In this case the number of 1's counted by the integrator [19] is used not to generate a 0 or 1 at the output of [19] itself but as the address of a storage location in the lookup table [27], with each location in the lookup table containing a 0 or 1. Thus the output of the lookup table [27] is a 0 or 1 when addressed by the output of [19].
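One possible reading of the lookup table [27] is sketched below: the table holds a 0 or 1 at each address c = 0 . . . R, set stochastically so that, averaged over rebuilds of the table, a count of c yields a 1 with probability f(c/R). This stochastic-contents construction is an assumption for illustration; a deterministic table could be built in the same way.

    import math, random

    def build_lookup_table(R, beta, theta):
        # sigmoid (7) with inverse temperature beta and threshold theta
        f = lambda s: 1.0 / (1.0 + math.exp(-beta * (s - theta)))
        # entry at address c (the integrator count) is a 0 or 1;
        # the i-pRAM output [21] is then table[count]
        return [1 if random.random() < f(c / R) else 0 for c in range(R + 1)]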

The i-pRAM just described can be developed further to implement a generalised form of the training rule (2). According to rule (2), the input of a single binary address results in the contents of the single addressed location being modified. However, the i-pRAM can be used to implement a generalised form of the training rule (2) in which the input of a real-valued number causes the contents of a plurality of locations to be modified. This is achieved by using an address counter for counting the number of times each of the storage locations is addressed, thus providing what will be referred to herein as a learning i-pRAM. This generalised training rule is

    Δα_u(t) = ρ((a − α_u)r + λ(ā − α_u)p)(t)·X_u(t)    (8)

where

    X_u(t) = (1/R) Σ_{r=1}^{R} δ(u − i(r))

replaces the delta function in (2). Thus in the learning i-pRAM case, every location [3] is available to be updated, with the change proportional to that address's responsibility for the ultimate i-pRAM binary output a(t) (obtained as described above in connection with equation (5)).

The X_u's record the frequency with which addresses have been accessed. A simple modification to the memory section of the pRAM (FIG. 1) allows the number of times each address is accessed to be recorded using counters or integrators [22], as shown in FIG. 5.
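A sketch of one learning i-pRAM cycle implementing rule (8), with the access counters [22] accumulating X_u = (number of accesses of u)/R, continuing the earlier illustrative code:

    def learning_ipram_step(pram, x, R, r, p, rho, lam):
        access = [0] * len(pram.memory)       # access counters [22]
        count = 0
        for i in real_to_spikes(x, R):
            u = int("".join(str(b) for b in i), 2)
            access[u] += 1                    # record the address frequency
            count += pram.step(i)
        a = 1 if count + random.randrange(R) >= R else 0   # i-pRAM output
        scale = 2 ** pram.m
        for u, n in enumerate(access):        # rule (8): every location updated
            if n:
                alpha = pram.memory[u] / scale
                X_u = n / R
                delta = rho * ((a - alpha) * r + lam * ((1 - a) - alpha) * p) * X_u
                alpha = min(max(alpha + delta, 0.0), 1.0)
                pram.memory[u] = min(int(round(alpha * scale)), scale - 1)
        return a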

The X_u's could also be recorded in an auxiliary N-input pRAM, and used to modify the memory contents in a similar manner to FIG. 3. However, this method takes 2^N times longer than that using the architecture of FIG. 5.

For similar reasons to those considered in connection with FIGS. 2 and 3, training may be accelerated by letting the learning rate constant, ρ, have an initially high value and tend to zero with time, this being achieved in a similar manner to that described above.

Rule (8) may be further generalised in order to deal with situations in which reward or punishment may arrive an indefinite number of time steps after the critical action which caused the environmental response. In such delayed reinforcement tasks it is necessary to learn path-action, rather than position-action, associations. This can be done by adding eligibility traces to each memory location, as shown in FIG. 6. These decay exponentially where a location is not accessed, but otherwise are incremented to reflect both access frequency and the resulting i-pRAM action. In this context, "access" means that a storage location with a given address has been accessed, "activity" means that when the storage location was accessed it resulted in the pRAM firing (i.e. a 1 at its output), and "inactivity" means that when the storage location was accessed it did not result in the pRAM firing (i.e. a 0 at its output). The trace e_u in the counters or integrators [23] records the number of occasions on which there was "access and activity" for each given storage location, whilst the trace f_u recorded in the counters or integrators [24] records the number of occasions on which there was "access and inactivity" for each given storage location (both are equally important in developing an appropriate response to a changing environment). As in FIG. 5, the counter or integrator [22] records the total number of times each storage location was accessed. The eligibility traces are initialised to zero at the start of a task, and subsequently updated so that at a time t they have the values

    e_u(t) = δ·e_u(t−1) + δ̄·a(t)·X_u(t)    (9a)

    f_u(t) = δ·f_u(t−1) + δ̄·ā(t)·X_u(t)    (9b)

where δ is a selected constant, 0 ≤ δ < 1, and δ̄ = 1 − δ. FIG. 7 shows the mechanism whereby the eligibility trace e_u is updated according to equation 9a, showing that this feature is hardware-realisable. The current value of e_u is read from the port [26] and multiplied by the eligibility trace decay rate, δ, at [28] using a multiplier [13]. This product is combined using an adder [12] with the product of the pRAM output a(t) [4], the access count data X_u [25] and the complement of the decay rate, δ̄ [29], before being written back as e_u [23] using the write port [27]. This implements equation 9a.

Updating the f_u term is identical to the above, except that it is the inverse of the output, a(t), which is used, so as to implement equation 9b.
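The trace updates (9a) and (9b) are then, in software (e, f and X held as per-location arrays; the names are illustrative):

    def update_traces(e, f, X, a, decay):
        # decay is the constant delta of equations (9a) and (9b), 0 <= delta < 1
        for u in range(len(e)):
            e[u] = decay * e[u] + (1 - decay) * a * X[u]        # (9a)
            f[u] = decay * f[u] + (1 - decay) * (1 - a) * X[u]  # (9b)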

The necessary extension of equation (8), which results in the capacity to learn about temporal features of the environment, is

    Δα_u(t) = ρ((ᾱ_u e_u − α_u f_u)r + λ(ᾱ_u f_u − α_u e_u)p)(t)    (10)

When δ = 0, e_u = a·X_u and f_u = ā·X_u, and it may be seen that (10) reduces to the original learning i-pRAM training rule (8).

In addition to updating the eligibility traces (shown in FIG. 7), the memory contents α_u are modified so that learning behaviour may be implemented. FIG. 8 shows the operations required in addition to those of FIG. 7 in order to implement equation 10. Multiplier [31] forms the product of e_u and ᾱ_u, and multiplier [32] forms the product of f_u and ᾱ_u. Multiplier [33] forms the product of e_u and α_u, and multiplier [34] forms the product of f_u and α_u. The product formed by multiplier [33] is subtracted from the product formed by multiplier [32] in the subtractor [35]. The product formed by multiplier [34] is subtracted from the product formed by multiplier [31] in the subtractor [36]. The output of the subtractor [35] is multiplied by a penalty factor p, which is an input from the environment to the multiplier [37] at [39]. The output of the subtractor [36] is multiplied by a reward factor r, which is an input from the environment to the multiplier [38] at [40]. The outputs of the multipliers [37] and [38] are added to the original memory contents at [19] using the adder [12]. The output from the adder [12] is written back into the memory using the write port [10], and the memory is thereby updated. The operations described implement the training rule described in equation 10.
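In software, the memory update of equation (10) mirrors the multiplier and subtractor arrangement of FIG. 8 (again a sketch under the earlier illustrative assumptions, with ᾱ_u written as 1 − α):

    def delayed_reinforcement_update(pram, e, f, r, p, rho, lam):
        scale = 2 ** pram.m
        for u in range(len(pram.memory)):
            alpha = pram.memory[u] / scale
            reward = (1 - alpha) * e[u] - alpha * f[u]      # subtractor [36]
            penalty = (1 - alpha) * f[u] - alpha * e[u]     # subtractor [35]
            alpha += rho * (reward * r + lam * penalty * p) # equation (10)
            alpha = min(max(alpha, 0.0), 1.0)
            pram.memory[u] = min(int(round(alpha * scale)), scale - 1)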

An alternative to the training rule of equation (8) is a rule which may take account more realistically of the behaviour of the whole i-pRAM. This alternative is expressed by

    Δα_u^(i) = ρ{[(ᾱ_u^(i) g)a_i − (α_u^(i) ḡ)ā_i]r + λ[(ᾱ_u^(i) ḡ)ā_i − (α_u^(i) g)a_i]p}X_u

where g is a suitable function of Σ = Σ_u α_u^(i) X_u, such as, for example, the sigmoid function of equation (7). Where eligibility traces are added, this becomes

    Δα_u^(i) = ρ{[(ᾱ_u^(i) g)e_u − (α_u^(i) ḡ)f_u]r + λ[(ᾱ_u^(i) ḡ)f_u − (α_u^(i) g)e_u]p}

In the various aspects of the invention described herein, the devices are described as being realised in dedicated hardware. It will be appreciated that the invention can alternatively be realised in software, using a conventional digital computer to simulate the hardware described, and the present application is intended to encompass that possibility. However, software simulation is unlikely to be practical except for very small networks, and the hardware approach is much more practical for larger and therefore more interesting networks.

It should also be noted that other hardware realisations are possible, for example using VLSI technology.

We claim:
 1. A device for use in a neural processing network, comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored; means for selectively addressing each storage location of said plurality of storage locations as an addressed storage location to cause the contents of the addressed storage location to be written to a comparator; a noise generator for inputting to the comparator a random number representing noise; means for causing to appear at an output of the comparator an output signal having a first or second value depending on the values of the numbers received from the addressed storage location and the noise generator, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed storage location; and a plurality of trace counters each corresponding to a respective one of the storage locations, said trace counters each having contents that are subject to an increment on each occasion when both the corresponding storage location is accessed and the output signal of the comparator has the first value, the addressed storage locations storing values of numbers that are increased or decreased in dependence on the contents of said trace counters.
 2. A device according to claim 1, wherein the contents of said trace counters are subject to a decay factor each time any of the storage locations is accessed.
 3. A device according to claim 2, comprising an address counter for each of the storage locations and for counting the number of times each respective one of the storage locations is addressed.
 4. A device according to claim 3, wherein each of said trace counters has as the contents e_u(t) at time t given by

    e_u(t) = δ·e_u(t−1) + δ̄·a(t)·X_u(t)

where δ is a selected constant, 0 ≤ δ < 1, and δ̄ = 1 − δ, a is the output of the comparator and X_u(t) is the frequency with which location u is addressed, as counted by the respective address counter.
 5. A device according to claim 1, wherein said trace counters comprise first trace counters, said device further comprising a plurality of second trace counters each corresponding to a respective one of the storage locations, said second trace counters each having contents subject to an increment on each occasion when both the corresponding storage location is accessed and the output signal of the comparator has the second value, the addressed storage locations storing values of numbers that are increased or decreased in dependence on the contents of said first and second trace counters.
 6. A device according to claim 5, wherein the contents of the said second trace counters are subject to a decay factor each time any of the storage locations is accessed.
 7. A device according to claim 6, comprising an address counter for each storage location and for counting the number of times each respective one of the storage locations is addressed.
 8. A device according to claim 7, wherein the contents f_u(t) of said second trace counters at time t are given by

    f_u(t) = δ·f_u(t−1) + δ̄·ā(t)·X_u(t)

where δ is a selected constant, 0 ≤ δ < 1, and δ̄ = 1 − δ, a is the output of the comparator (ā = 1 − a) and X_u(t) is the frequency with which location u is accessed, as counted by a respective one of the address counters.
 9. A device according to claim 8, wherein the increase or decrease, Δα_u(t), in the value of the number stored at a given location is given by the equation:

    Δα_u(t) = ρ((ᾱ_u e_u − α_u f_u)r + λ(ᾱ_u f_u − α_u e_u)p)(t)

where r and p are reward and penalty factors respectively and ρ is a constant.
 10. A device according to claim 1, in which said memory includes address lines, and further comprising a real-number to digital converter which receives a plurality of real-valued numbers each in the range of 0 to 1 and produces at its output a corresponding plurality of parallel pulse trains which are applied to respective ones of the address lines of the memory to define a succession of storage location addresses, the probability of a pulse representing a 1 being present in an address on a given address line being equal to the value of the real-valued number from which the pulse train applied to that address line was derived; and an integrator for integrating the output signal from the comparator.
 11. A device according to claim 10, further comprising an output generator connected to the integrator and having an output at which a given one of two values appears as a function of an integrated value produced by the integrator.
 12. A device according to claim 11, wherein the output generator contains a look-up table for generating the given one of the two values as a function of the integrated value produced by the integrator.
 13. A device according to claim 10, wherein the random number and the numbers at the storage locations have the same number of bits, and wherein the comparator is operable to add the values of the random number received and the number received from one of the addressed storage locations, the output signal having the said first or second value depending on whether or not the addition results in an overflow bit.
 14. A device according to claim 1, comprising means for receiving signals from an environment, the signals representing success or failure of the network; means for changing the value of the number stored at the addressed storage location if a success signal is received, in such a way as to increase the probability of a successful action; and means for changing the value of the number stored at the addressed storage location if a failure signal is received, in such a way as to decrease the probability of an unsuccessful action.
 15. A device according to claim 1, wherein the memory is a random access memory.
 16. A device for use in a neural processing network, comprising a memory having address lines and a plurality of storage locations at each of which a number representing a probability is stored; a real-number to digital converter which receives a plurality of real-valued numbers each in the range of 0 to 1 and produces at its output a corresponding plurality of synchronized parallel pulse trains which are applied to respective address lines of the memory as a succession of storage location addresses, thereby to selectively address each storage location of said plurality of storage locations as an addressed storage location, the probability of a pulse representing a 1 being present in an address on a given address line being equal to the value of the real-valued number from which was derived an applied one of the pulse trains to that address line; a comparator for successively receiving as an input the numbers stored at each of the addressed storage locations; a noise generator for inputting to the comparator a succession of random numbers representing noise; means for causing to appear at an output of the comparator a succession of output signals each having a first or second value depending on the values of the numbers received from the addressed storage locations and the noise generator, the probability of the output signal having a given one of the first and second values being determined by the number at one of the addressed storage locations; an integrator for integrating the output signals from the comparator; means for receiving signals from an environment, the signals representing a successful action or an unsuccessful action by the network; means for changing the value of the numbers stored at the addressed storage locations if a success signal is received, in such a way as to increase the probability of the successful action; means for changing the value of the number stored at the addressed storage locations if a failure signal is received, in such a way as to decrease the probability of the unsuccessful action; an address counter for counting the number of times each of the storage locations is addressed; means for increasing or decreasing the values of the numbers stored at the addressed storage locations in dependence on the number of times each of the storage locations is addressed, as counted by the address counter; and two further counters for each of the storage locations, one of said further counters having contents subject to an increment on each occasion when the storage location is addressed and the output signal of the device has the first value, and the other of said further counters having contents subject to an increment on each occasion when the storage location is addressed and the output signal of the device has the second value, the contents of both said further counters being subject to a decay factor each time any storage location is accessed, the values of the numbers stored at the addressed storage locations being increased or decreased in dependence on the contents of said further counters.
 17. A device according to claim 16, wherein the contents e_u(t) and f_u(t) of the said further counters at time t are given by

    e_u(t) = δ·e_u(t−1) + δ̄·a(t)·X_u(t)

    f_u(t) = δ·f_u(t−1) + δ̄·ā(t)·X_u(t)

where δ is a selected constant, 0 ≤ δ < 1, δ̄ = 1 − δ, a is the output of the comparator (ā = 1 − a) and X_u(t) is the frequency with which location u is addressed, as counted by the respective address counter.
 18. A device according to claim 17, wherein the increase or decrease, Δα_u(t), in the value of the number stored at a given location is given by the equation:

    Δα_u(t) = ρ((ᾱ_u e_u − α_u f_u)r + λ(ᾱ_u f_u − α_u e_u)p)(t)

where r and p are reward and penalty factors respectively and ρ is a constant.
 19. A device according to claim 18, wherein the memory is a random access memory.
 20. A device according to claim 16, wherein the memory is a random access memory.